
    A New Framework for Join Product Skew

    Different types of data skew can cause load imbalance in parallel joins under the shared-nothing architecture. We study one important type of skew, join product skew (JPS). We propose a static approach based on frequency classes, which assumes that the data distribution of the join attribute values is known. It stems from the observation that the join selectivity can be expressed as a sum of products of frequencies of the join attribute values. Consequently, an assignment of join sub-tasks that takes into account the magnitude of the frequency products can alleviate join product skew. Motivated by this observation, we propose an algorithm, called Handling Join Product Skew (HJPS), to handle join product skew.
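The observation in this abstract, that the join result size is a sum of per-value frequency products, can be illustrated with a small sketch. This is not the HJPS algorithm itself, whose details the abstract does not give. The function names and the greedy largest-first assignment rule are assumptions chosen for illustration only.

```python
from collections import Counter
import heapq


def join_size(r_values, s_values):
    """Equi-join result size: sum over shared values v of f_R(v) * f_S(v)."""
    f_r, f_s = Counter(r_values), Counter(s_values)
    return sum(f_r[v] * f_s[v] for v in f_r.keys() & f_s.keys())


def assign_by_product(r_values, s_values, n_nodes):
    """Hypothetical heuristic (not HJPS): hand each join value, largest
    frequency product first, to the node with the smallest load so far."""
    f_r, f_s = Counter(r_values), Counter(s_values)
    products = {v: f_r[v] * f_s[v] for v in f_r.keys() & f_s.keys()}

    # Min-heap of (current load, node id); ties go to the lower node id.
    heap = [(0, node) for node in range(n_nodes)]
    heapq.heapify(heap)

    assignment = {}
    # Sort by descending product, then by value name for determinism.
    for v, p in sorted(products.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        load, node = heapq.heappop(heap)
        assignment[v] = node
        heapq.heappush(heap, (load + p, node))

    loads = [0] * n_nodes
    for v, node in assignment.items():
        loads[node] += products[v]
    return assignment, loads


# Join attribute columns of two toy relations R and S.
R = ["a"] * 3 + ["b"] * 2 + ["c"]
S = ["a"] * 2 + ["b"] * 2 + ["c"] * 4 + ["d"]

total = join_size(R, S)
assignment, loads = assign_by_product(R, S, n_nodes=2)
print(total, assignment, loads)
```

Here the per-node loads are measured in output tuples, so balancing them balances the join product work rather than the input sizes.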

    Improved Methods for Extracting Frequent Itemsets from Interim-Support Trees

    Mining association rules in relational databases is a significant computational task with many applications. A fundamental ingredient of this task is the discovery of sets of attributes (itemsets) whose frequency in the data exceeds some threshold value. In previous work [9] we introduced an approach to this problem which begins by carrying out an efficient partial computation of the necessary totals, storing these interim results in a set-enumeration tree. This work demonstrated that making
    ∗ Aris Pagourtzis and Dora Souliou were partially supported for this research by “Pythagoras

    Community Detection via Neighborhood Overlap and Spanning Tree Computations

    Most of today's social networks have several million active users, while the most popular of them have well over one billion. Analyzing such huge complex networks has become particularly demanding in computational terms. A task of paramount importance for understanding the structure of social networks, as well as of many other real-world systems, is to identify communities, that is, sets of nodes that are more densely connected to each other than to other nodes of the network. In this paper we propose two algorithms for community detection in networks, employing the neighborhood overlap metric and appropriate spanning tree computations.
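As a rough illustration of the neighborhood overlap metric named in this abstract, the sketch below uses the common definition for an edge (u, v): the number of shared neighbors divided by the number of nodes adjacent to at least one endpoint, excluding u and v themselves. Whether the paper uses exactly this variant is an assumption. The helper names are hypothetical, and the spanning tree step is not shown because the abstract does not describe it.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighborhood_overlap(adj, u, v):
    """|N(u) & N(v)| / |(N(u) | N(v)) - {u, v}|, or 0.0 if the union is empty."""
    common = adj[u] & adj[v]
    union = (adj[u] | adj[v]) - {u, v}
    return len(common) / len(union) if union else 0.0


# Two triangles, {1, 2, 3} and {4, 5, 6}, joined by the single edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = build_adjacency(edges)

overlaps = {(u, v): neighborhood_overlap(adj, u, v) for u, v in edges}
for edge, score in overlaps.items():
    print(edge, score)
```

Edges inside a triangle score above zero, while the connecting edge (3, 4) scores zero, which is why low-overlap edges are natural candidates for boundaries between communities.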

    Frequent itemsets mining


    Combining probabilistic neural networks and decision trees for maximally accurate and efficient accident prediction

    The extent to which accident severity can be predicted from accident-related data collected at a variety of locations is investigated. The 2005 accident dataset brought together by the Republic of Cyprus Police is employed; this dataset comprises 1407 records of 43 continuous and categorical input parameters and a single categorical output parameter representing accident severity. No transformation of the database has been opted for, either by extracting the parameters that are significant for the prediction task or by modifying the records in any way (e.g. via record selection or transformation). Aiming at maximally accurate and efficient prediction, a combination of probabilistic neural networks (PNN's) and decision trees (DT's) is implemented: the simple training and direct operation of the PNN is complemented by the hierarchical, exhaustive and recursive construction of the DT. By training pairs of PNN's on data from the partitions derived from the minimal necessary number of top DT nodes, both efficiency and accident prediction accuracy are maximized. © 2010 IEEE

    Maximising Accuracy and Efficiency of Traffic Accident Prediction Combining Information Mining with Computational Intelligence Approaches and Decision Trees

    The development of universal methodologies for the accurate, efficient, and timely prediction of traffic accident location and severity constitutes a crucial endeavour. In this piece of research, the best combinations of salient accident-related parameters and accurate accident severity prediction models are determined for the 2005 accident dataset brought together by the Republic of Cyprus Police. The optimal methodology involves: (a) information mining in the form of feature selection of the accident parameters that maximise prediction accuracy (implemented via scatter search), followed by feature extraction (implemented via principal component analysis) and selection of the minimal number of components that contain the salient information of the original parameters, which combined bring about an overall 74.42% reduction in the dataset dimensionality; (b) accident severity prediction via probabilistic neural networks and random forests, both of which independently accomplish over 96% correct prediction and a balanced proportion of under- and over-estimations of accident severity. An explanation of the superiority of the optimal combinations of parameters and models is given, as is a comparison with existing accident classification/prediction approaches